Abstract

Background: Functional annotations are available only for a very small fraction of microRNAs (miRNAs) and very 
few miRNA target genes are experimentally validated. Therefore, functional analysis of miRNA clusters has typically
relied on computational target gene prediction followed by Gene Ontology and/or pathway analysis. These 
previous methods share the limitation that they do not consider the many-to-many-to-many tri-partite network 
topology between miRNAs, target genes, and functional annotations. Moreover, the highly false-positive nature of 
sequence-based target prediction algorithms causes propagation of annotation errors throughout the tri-partite 
network. 

Results: A new conceptual framework is proposed for functional analysis of miRNA clusters, which extends the 
conventional target gene-centric approaches to a more generalized tri-partite space. Under this framework, we 
construct miRNA-, target link-, and target gene-centric computational measures incorporating the whole tri-partite 
network topology. Each of these methods and all their possible combinations are evaluated on publicly available 
miRNA clusters and with a wide range of variations for miRNA-target gene relations. We find that the miRNA- 
centric measures outperform others in terms of the average specificity and functional homogeneity of the GO 
terms significantly enriched for each miRNA cluster. 

Conclusions: We propose novel miRNA-centric functional enrichment measures in a conceptual framework that 
connects the spaces of miRNAs, genes, and GO terms in a unified way. Our comprehensive evaluation result 
demonstrates that functional enrichment analysis of co-expressed and differentially expressed miRNA clusters can 
substantially benefit from the proposed miRNA-centric approaches. 



Background 

MicroRNAs (miRNAs) are short single stranded, non- 
coding RNAs that regulate protein-coding mRNAs [1-4]. 
Mature miRNAs cause either target mRNA degradation or 
translational repression [4] by inducing cleavage or inhibit- 
ing translation in the 3'-untranslated regions (UTRs) of the 
target mRNA [2,3]. In spite of the continuous attempts to
identify miRNAs and to elucidate their basic mechanisms
of action, little is understood about their biological 
functions. 

Because of the regulatory role of miRNAs [5] and lack of 
direct functional annotation to miRNAs, current functional 
enrichment methods for miRNAs rely instead on their tar- 
get genes' functional annotations [6-8]. If the target genes 
of a specific miRNA are significantly enriched with a set of 
Gene Ontology (GO) terms, it is reasonable to infer that 
the miRNA is also involved in the same GO annotations. 
Because only a few experimentally validated targets are
available, current methods that infer miRNA function from
target-gene annotations rely on target prediction
algorithms such as TargetScan [9,10] and PicTar [11].

Many studies on miRNAs have used this "predicted tar- 
get-genes' functional annotation-based" miRNA function 
prediction strategy. Gaidatzis et al. [12] applied a
log-likelihood test for functional enrichment analysis for
KEGG pathways. Gusev [13] used hypergeometric distri- 
butions for GO and pathway-based enrichment analysis. 
Xu and Wong [14] applied hypergeometric distribution 
test to detect significant over-representation of miRNA 
cluster targets in BioCarta pathways. Similar methods 
using GO, KEGG and BioCarta pathways were imple- 
mented in miRGator [15] and SigTerms [16], applying 
hypergeometric distributions to evaluate functional 
enrichment. 

The target links from miRNAs to genes, however, are very
unevenly distributed, and so are the links from genes to
GO terms. One miRNA may regulate several hundred targets,
and one gene may be controlled by many miRNAs [17]. The
current methods, which rely only on the predicted target
genes' functional annotations, are not powerful enough to
capture such variability. For instance, if a certain miRNA
targeting hundreds of genes is shared by different miRNA
clusters, the clusters' functional annotations may become
very similar even though they consist of very different
miRNA members, just because they share the 'very busy' one.
Another limitation of the current methods is that they
treat all target genes equally. Genes that are targeted by
only one member of a miRNA cluster should be weighted
differently from those that are targeted by all members.
In summary, the current functional enrichment methods for
miRNA clusters do not consider the tri-partite network
topology from miRNAs to genes to functional annotations.
This topology reflects multiplicity and cooperativity and
contains more information than simple target gene counts.

For the purpose of illustration, Figure 1(A) and 1(B)
exhibit example cases where the same number of miRNAs (5)
from equal-sized clusters (6) target the same number of
target genes (6) out of the same number of genes (11) that
are annotated to a specific GO term, GO:0030282 and
GO:0051482, respectively. The numbers of target links in
Figure 1(A) and 1(B), however, differ (8 and 22,
respectively). Figure 1(C) and 1(D) exhibit cases where the
numbers of miRNAs connected to a specific GO term,
GO:0015917 and GO:0030851, differ (6 and 3, respectively)
while the numbers of links (6) are the same. The current
approach, which is based only on target gene counts, is
unable to discern the difference in these targeting
relations.

The present study proposes a more generalized conceptual
framework to develop and analyze new functional enrichment
measures. According to the framework, the traditional
"predicted target-genes' functional annotation-based" miRNA
function prediction method is regarded as 'target
gene-centric', denoted by ρ, because it eventually
considers only the fraction of the target genes among
those that are annotated to a specific GO term. Under the
proposed framework, we derive 'target link-centric' (τ) and
'miRNA-centric' (μ) measures, considering the numbers of
links and miRNAs linked to a specific GO term.

Figure 1 illustrates that while the traditional target
gene-centric ρ measure cannot discern (A) from (B) (p =
0.30325) nor (C) from (D) (p = 0.31120), the newly proposed
τ and μ measures successfully discern (A) from (B) (i.e.,
p = 0.62358 and p = 0.00956, respectively) and (C) from (D)
(i.e., p = 0.00695 and p = 0.65253, respectively). Measures
calculated from different viewpoints can thus substantially
change the result of functional enrichment analysis of
miRNA clusters. We also propose a rank statistic for the
purpose of systematic comparison in terms of the average
specificity and functional homogeneity of the significantly
enriched terms for each GO category: Biological Process
(BP), Molecular Function (MF), and Cellular Component (CC).
We show that the proposed miRNA-centric measures identify
more specific and functionally homogeneous sets of GO
annotations for miRNA clusters.

Methods 

Dataset: miRNA clusters 

We used publicly available co-expressed and differentially
expressed miRNA clusters for comparative evaluation of
the proposed methods. For co-expressed miRNA clusters,
we obtained the data created by Ruepp et al. [18], which
show correlated expression patterns across several human
diseases. The data can be downloaded from Ruepp et al.
[18] (http://genomebiology.com/content/supplementary/gb-2010-11-1-r6-s2.xls).
Forty-three of the 47 clusters, those having at least one
target gene, were used in this study. Differentially
expressed miRNA sets consisting of up- or down-regulated
genes in six solid tumors were also downloaded [19].
MiRNAs down-regulated in colon cancer had no target gene
and hence were excluded from the present study.
Supplementary Tables S1 and S2 in 'Additional file 1' list
the 54 (= 43 + (2 × 6) - 1) miRNA clusters from the two
studies with the associated information.

Creating variations of miRNA-mRNA target pairs for 
comprehensive evaluation 

Another input of our analysis is the target gene list of each
miRNA, which guides the functional enrichment test based on
the gene annotations. Considering that only a few
experimentally validated miRNA targets are available, we use
miRNA-mRNA target pairs obtained from computational target
prediction methods. Prediction algorithms generate a
relatively high level of false positives [20], and the degree
of overlap between predicted targets from different methods
is often poor or null [21]. Given the lack of a 'gold
standard' for miRNA and target gene pairs, we consider a wide
range of variations in miRNA-gene pair relations for
comprehensive evaluation. We used miRecords [22] and miRGen
[23], which are integrated resources of miRNA-target
interactions from 11 established target prediction algorithms
and from four of the most widely used target prediction
programs, respectively. We created 21 variations of predicted
target pairs by considering the number of positive voters
among the algorithms included in miRecords (Table 1, upper
panel) and six variations by applying the four programs of
miRGen (Table 1, lower panel). Because the evaluation results
from these variations were largely comparable, the most
representative variation, #6 in Table 1, is used to describe
the overall study results in the following sections.
Variation #6 was created by applying the 11 algorithms
provided by miRecords and requiring three or more positive
voters (3 ~ 11 in Table 1), resulting in 1,569,741 target
links from 553 miRNAs to 17,636 genes. As the number of
required positive voters increases, the numbers of miRNAs,
links and genes decrease, as can be seen in Table 1.

Figure 1 Indiscernibility example. Calculating the target gene-centric (ρ) hypergeometric distribution cannot discern the completely different
targeting topologies between (A) and (B) and between (C) and (D), resulting in the same p-values (p = 0.30325 and 0.31120, respectively). The
target link-centric (τ) p-values can discriminate (A) and (B) (i.e., p = 0.62358 and 0.00956, respectively) and the miRNA-centric (μ) p-values can
discriminate (C) and (D) (i.e., p = 0.00695 and 0.65253, respectively). *p < 0.05, hypergeometric test.

Target gene-, target relation-, and miRNA-centric 
calculations of hypergeometric distributions 

We now describe the details of the proposed measures
within the proposed conceptual framework. Suppose we want
to test the functional enrichment of a miRNA cluster
with respect to a specific GO term (or annotation).
In most previous approaches, one first constructs a
corresponding target gene cluster consisting of all the
genes targeted by at least one member of the miRNA
cluster. Then the numbers of target genes annotated (ρ_i)
and not annotated (ρ_j) by the GO term are entered in a
two-by-two contingency table along with the numbers of
genes that are not in the target cluster and are either
annotated (ρ_k) or not annotated (ρ_l) with the term, as
shown in Figure 2(B). Functional enrichment is tested from
this contingency table using a hypergeometric distribution.
These traditional target gene-centric (ρ) methods are
limited in that they consider only the fraction of target
genes connected to a specific annotation [12-14], as
already illustrated in Figure 1. To address this problem,
the diagram and contingency tables in Figure 2 provide a
conceptual framework to understand and correctly design new
functional enrichment measures. The diagram of miRNA, gene and



Table 1 Variation for predicted miRNA-gene target pairs

Index  No. of algorithms showing positive voting  miRNAs  Target links   Genes

miRecords (Xiao et al., 2009)
#1     3                                             553     1,234,390  17,602
#2     4                                             535       272,505  15,278
#3     5                                             407        53,041   9,747
#4     6                                             159         9,691   2,783
#5     7                                              29            68      66
#6     3 ~ 11                                        553     1,569,741  17,636
#7     4 ~ 11                                        535       335,351  15,422
#8     5 ~ 11                                        408        62,846   9,851
#9     6 ~ 11                                        159         9,805   2,816
#10    7 ~ 11                                         40           114     104
#11    3 ~ 11 including DIANA-microT                   0             0       0
#12    3 ~ 11 including Microinspector                56           184     160
#13    3 ~ 11 including miRanda                      552     1,416,379  17,584
#14    3 ~ 11 including mirtarget2                   530       184,544  13,841
#15    3 ~ 11 including miTarget                       0             0       0
#16    3 ~ 11 including NBmiRTar                      42           201     172
#17    3 ~ 11 including PicTar                       163        64,658   6,515
#18    3 ~ 11 including pita                         551     1,559,586  16,676
#19    3 ~ 11 including rna22                         54           232     197
#20    3 ~ 11 including RNAhybrid                    552     1,548,423  17,630
#21    3 ~ 11 including TargetScan                   412       343,190  16,127

miRGen [23]
#22    DIANA-microT                                  175         1,816   1,206
#23    miRanda (microrna.org)                        469       430,878  16,699
#24    miRanda (miRBase)                             156        38,821   5,444
#25    PicTar (4-way)                                177        68,100   6,391
#26    PicTar (5-way)                                128        22,028   2,433
#27    TargetScanS                                   237        75,044   7,546

Figure 2 Framework for developing three types of miRNA functional enrichment measures. A conceptual framework is constructed to
consider the tri-partite network topology. (A) A miRNA cluster under investigation contains the members, μ_i and μ_j, targeting genes that are
associated (ρ_i) and not associated (ρ_j) with a specific GO term of interest through τ_i and τ_j, respectively. Non-member miRNAs may be associated
(μ_k → ρ_k) or not (μ_l → ρ_l) with the GO term through τ_k and τ_l. Counts for (D) miRNA-centric (μ) and (C) target link-centric (τ) as well as (B) target
gene-centric (ρ) are listed by two-by-two contingency tables. The closed and broken circles in the miRNA world depict the miRNA cluster under
investigation and the subset miRNAs targeting the genes that are associated with a specific GO term of interest.



annotation worlds in Figure 2(A) depicts the tri-partite
network topology between the three worlds, from which
one can derive the four counts needed to create contingency
tables for the miRNA-centric (μ) and target link-centric (τ)
measures as well as for the target gene-centric (ρ) measure
(Figure 2(B)~(D)).

Under this conceptual framework in Figure 2, subscripts
i and k represent positive connections and subscripts j and
l negative connections to the GO term. Subscripts i and j
represent connections from inside, and k and l from
outside, the targeting miRNA or target gene clusters.
The traditional ρ_i and ρ_j, for example, correspond to the
sets of target genes that are annotated (ρ_i) and not
annotated (ρ_j) to a specific GO term; ρ_k and ρ_l denote
non-targeted genes that are annotated (ρ_k) and not
annotated (ρ_l) to the GO term. We can develop a
miRNA-centric measure within the same conceptual framework
in a consistent way. We define μ_i and μ_j as the
miRNAs in the cluster whose target genes are annotated
(μ_i) and not annotated (μ_j) to the GO term. As in the
case of the gene-centric measure, μ_k and μ_l correspond to
miRNAs outside of the cluster whose target genes are
annotated (μ_k) and not annotated (μ_l) to the GO term.
Similarly, for a target link-centric measure, we define τ_i
and τ_j as the target links connecting members of the
miRNA cluster in μ_i and in μ_j, respectively, to genes
that are connected (ρ_i) and not connected (ρ_j) to a
specific GO term. The remaining miRNAs outside the cluster,
μ_k and μ_l, target genes through τ_k and τ_l, which lead
to genes that are connected (ρ_k) and not connected (ρ_l)
to the GO term.
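
The counting scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `tripartite_counts`, the input layout (a miRNA-to-genes dictionary, a set of annotated genes, and a gene universe), and the toy identifiers are all assumptions made for the example. It also assumes one particular reading of the definitions: every link is classified by whether its miRNA is in the cluster and whether its gene is annotated, and a miRNA counts as positive if at least one of its targets is annotated.

```python
from typing import Dict, Set, Tuple

Quartet = Tuple[int, int, int, int]  # counts ordered as (i, j, k, l)


def tripartite_counts(
    cluster: Set[str],
    targets: Dict[str, Set[str]],
    annotated: Set[str],
    universe: Set[str],
) -> Tuple[Quartet, Quartet, Quartet]:
    """Return the (rho, tau, mu) count quartets for one GO term."""
    # Gene-centric: genes targeted by at least one cluster member vs. the rest.
    cluster_targets: Set[str] = set()
    for mirna in cluster:
        cluster_targets |= targets.get(mirna, set())
    rest = universe - cluster_targets
    rho = (
        len(cluster_targets & annotated),
        len(cluster_targets - annotated),
        len(rest & annotated),
        len(rest - annotated),
    )

    tau = [0, 0, 0, 0]
    mu = [0, 0, 0, 0]
    for mirna, genes in targets.items():
        outside = 0 if mirna in cluster else 2  # offset into the (k, l) slots
        # Link-centric: classify every miRNA-gene link separately.
        for gene in genes:
            tau[outside + (0 if gene in annotated else 1)] += 1
        # miRNA-centric: positive if any target of the miRNA is annotated.
        mu[outside + (0 if genes & annotated else 1)] += 1
    return rho, (tau[0], tau[1], tau[2], tau[3]), (mu[0], mu[1], mu[2], mu[3])


# Toy network with hypothetical identifiers (not data from the study).
toy_targets = {
    "miR-a": {"g1", "g2"},
    "miR-b": {"g2"},
    "miR-c": {"g3"},
    "miR-d": {"g4"},
}
toy_rho, toy_tau, toy_mu = tripartite_counts(
    cluster={"miR-a", "miR-b"},
    targets=toy_targets,
    annotated={"g2", "g3"},
    universe={"g1", "g2", "g3", "g4", "g5"},
)
```

In the toy network the two cluster members share the annotated gene g2, so the gene-centric count sees one annotated target while the link-centric count sees two links, which is the kind of difference the three viewpoints are meant to expose.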

To formally define the three measures, let ρ, τ, and μ
be the random variables that represent the numbers of
target genes, target links, and miRNAs, respectively, which
are linked to a specific GO term as explained above.
The following three equations, (1), (2), and (3),
describe the hypergeometric distributions of ρ, τ, and
μ, respectively.

$$\mathrm{probability}(\rho = \rho_i) = \frac{\binom{\rho_i + \rho_k}{\rho_i}\binom{\rho_j + \rho_l}{\rho_j}}{\binom{\rho_i + \rho_j + \rho_k + \rho_l}{\rho_i + \rho_j}} \qquad (1)$$

$$\mathrm{probability}(\tau = \tau_i) = \frac{\binom{\tau_i + \tau_k}{\tau_i}\binom{\tau_j + \tau_l}{\tau_j}}{\binom{\tau_i + \tau_j + \tau_k + \tau_l}{\tau_i + \tau_j}} \qquad (2)$$

$$\mathrm{probability}(\mu = \mu_i) = \frac{\binom{\mu_i + \mu_k}{\mu_i}\binom{\mu_j + \mu_l}{\mu_j}}{\binom{\mu_i + \mu_j + \mu_k + \mu_l}{\mu_i + \mu_j}} \qquad (3)$$


Note that, for notational convenience and by abuse of
notation, we use ρ_a, τ_a, and μ_a for a ∈ {i, j, k, l},
instead of |ρ_a|, etc., to represent the number of members
in the corresponding set. The p-value for the enrichment
test from the hypergeometric distribution of the random
variable ρ is calculated from the cumulative probability of
observing at least ρ_i out of ρ_i + ρ_j times. Accordingly,
the p-value from each of the three measures can be defined
as follows:

$$p\text{-value}_\rho = \mathrm{probability}(\rho \ge \rho_i)$$
$$p\text{-value}_\tau = \mathrm{probability}(\tau \ge \tau_i)$$
$$p\text{-value}_\mu = \mathrm{probability}(\mu \ge \mu_i)$$

These probabilities are computed using the phyper 
and dhyper functions in R 'stats' package. 
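
As a rough illustration of the tail probability defined above, the following Python sketch sums the hypergeometric mass function of equations (1) to (3) from the observed count upward. The function name `enrichment_p_value` and its argument names are assumptions made for this example; the study itself reports using R, and this standard-library version is only meant to make the arithmetic explicit.

```python
from math import comb


def enrichment_p_value(n_i: int, n_j: int, n_k: int, n_l: int) -> float:
    """Upper-tail hypergeometric probability P(X >= n_i).

    n_i, n_j: items inside the cluster that are / are not linked to the term.
    n_k, n_l: items outside the cluster that are / are not linked to the term.
    The same function serves the gene-, link-, and miRNA-centric counts.
    """
    annotated = n_i + n_k          # all items linked to the GO term
    not_annotated = n_j + n_l      # all items not linked to the GO term
    drawn = n_i + n_j              # size of the cluster-side sample
    total = annotated + not_annotated
    upper = min(annotated, drawn)  # largest count that can be observed
    # Integer sum of the numerators of the mass function for x = n_i..upper.
    favourable = sum(
        comb(annotated, x) * comb(not_annotated, drawn - x)
        for x in range(n_i, upper + 1)
    )
    return favourable / comb(total, drawn)


# Toy quartet (hypothetical counts): 2 of 2 sampled items hit a term
# that covers 2 of 4 items overall.
toy_p = enrichment_p_value(2, 0, 0, 2)
```

Under the usual parameterization, the same tail should correspond to `phyper(n_i - 1, n_i + n_k, n_j + n_l, n_i + n_j, lower.tail = FALSE)` in R, since that call returns the probability of strictly exceeding its first argument.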

Combining P-values 

For the purpose of comprehensive evaluation, we created all
possible combinations of the three measures and tested
each of them on all GO categories and using different
miRNA-target gene pair sets. Figure 3 illustrates the steps
of combining the three types of hypergeometric distributions
for ρ, τ and μ. For each of the 54 miRNA clusters, each of
the 27 variations of miRNA-target gene pairs, each of the
three GO categories, and each annotation (or GO term), three
p-values, p_ρ, p_τ and p_μ, are first computed. Then, we
generate four combined p-values by using Fisher's combined
p-value method [24]:

p_{ρ,τ}: combined p-value of p_ρ and p_τ
p_{ρ,μ}: combined p-value of p_ρ and p_μ
p_{τ,μ}: combined p-value of p_τ and p_μ
p_{ρ,τ,μ}: combined p-value of p_ρ, p_τ and p_μ



INPUT: R = {x | x ∈ miRNA cluster}

For each of the 27 miRNA-gene pair variations
  For each GO category (i.e., BP, CC and MF)
    For each term t in the GO category
      Calculate three hypergeometric distributions
      Return p-value_ρ(t), p-value_τ(t), and p-value_μ(t)
    For each term t in the GO category
      Calculate four combined p-values
      Return p-value_{ρ,τ}(t), p-value_{ρ,μ}(t),
             p-value_{τ,μ}(t), and p-value_{ρ,τ,μ}(t)
    Rank transform (n = 1, ..., 100)
      S_ρ(n) = {t | rank(p-value_ρ(t)) ≤ n}
      S_τ(n) = {t | rank(p-value_τ(t)) ≤ n}
      S_μ(n) = {t | rank(p-value_μ(t)) ≤ n}
      S_{ρ,τ}(n) = {t | rank(p-value_{ρ,τ}(t)) ≤ n}
      S_{ρ,μ}(n) = {t | rank(p-value_{ρ,μ}(t)) ≤ n}
      S_{τ,μ}(n) = {t | rank(p-value_{τ,μ}(t)) ≤ n}
      S_{ρ,τ,μ}(n) = {t | rank(p-value_{ρ,τ,μ}(t)) ≤ n}

Figure 3 Steps for combining three types of p-values. For a
selected GO category and a miRNA-gene target-pair variation, for
each GO term, three p-values are computed for ρ, τ, and μ and
then rank normalized. S_ρ(n) denotes the set of GO terms whose
p-values' ranks in the ρ hypergeometric distribution are less than or
equal to n. By applying set operations, four combinations of S_ρ(n),
S_τ(n), S_μ(n) are created for further evaluation.



We briefly describe how Fisher's combined p-value
method can be applied to our proposed measures.
Under the null hypothesis of no significant enrichment,
the individual p-value for the random variable ρ, τ, or
μ follows the uniform distribution on [0, 1]. Then the
distribution of

$$Y = -\ln(p\text{-value})$$

is chi-squared with one degree of freedom. We have
three p-values from the ρ, τ, and μ hypergeometric
distributions, p_ρ, p_τ and p_μ, and thus we define

$$Y_\rho = -\ln(p_\rho), \quad Y_\tau = -\ln(p_\tau), \quad \text{and} \quad Y_\mu = -\ln(p_\mu)$$

Each of the random variables Y_ρ, Y_τ, and Y_μ follows
the chi-squared distribution with one degree of freedom.
The final four sums W are then defined as follows:

$$W_1 = Y_\rho + Y_\tau$$
$$W_2 = Y_\rho + Y_\mu$$
$$W_3 = Y_\tau + Y_\mu$$
$$W_4 = Y_\rho + Y_\tau + Y_\mu$$

The random variables W_1, ..., W_4 follow chi-squared
distributions with 2, 2, 2, and 3 degrees of freedom,
respectively. These random variables are used to produce the
'combined overall' p-values. To calculate these p-values, we
applied the fisherSum function in the R 'MADAM' package [25].
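
For readers who want to reproduce a combined p-value by hand, the sketch below implements Fisher's method in plain Python. It is not the MADAM routine used in the study; the function name `fisher_combined_p` is an assumption for this example. Note that the textbook statement of Fisher's method uses the statistic X = -2 Σ ln p_i, which is chi-squared with 2k degrees of freedom for k independent p-values. The sketch follows that textbook convention, so its scaling and degrees of freedom differ from the Y and W quantities as written above.

```python
from math import exp, factorial, log
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combine k p-values with Fisher's method (textbook form).

    Uses X = -2 * sum(ln p_i) ~ chi-squared with 2k degrees of freedom.
    For even degrees of freedom the survival function has a closed form:
        P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # equals X / 2
    return exp(-half_stat) * sum(half_stat ** j / factorial(j) for j in range(k))


# Hypothetical p-values for one GO term under two of the three measures.
toy_combined = fisher_combined_p([0.05, 0.05])
```

Fisher's method assumes the combined p-values are independent. The three measures here are computed from the same network, so the combined values are probably best read as a ranking score rather than as exact probabilities.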

The underlying distribution of p-values from each
method can be different due to the different
characteristics of the measures. To take this heterogeneity
into account, we rank-normalized the p-values for each GO
category, as shown in the last step of Figure 3.
Specifically, we construct the set S_θ(n) of the top n
significant GO terms having the smallest p-values for
each measure θ ∈ {ρ, τ, μ}. Four additional sets,
S_{ρ,τ}(n), S_{ρ,μ}(n), S_{τ,μ}(n), and S_{ρ,τ,μ}(n), for
the combined measures
are also created and used for further evaluation.
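
The rank normalization step can be illustrated with a short Python sketch. The function name `top_n_terms`, the tie-breaking rule (alphabetical by term identifier), and the placeholder term names and p-values are assumptions made for this example and are not taken from the study.

```python
from typing import Dict, Set


def top_n_terms(p_values: Dict[str, float], n: int) -> Set[str]:
    """Return the n terms with the smallest p-values for one measure.

    Ties are broken by term identifier so that the result is deterministic;
    the source does not state how ties were handled.
    """
    ranked = sorted(p_values, key=lambda term: (p_values[term], term))
    return set(ranked[:n])


# Hypothetical p-values for three placeholder terms under two measures.
p_rho = {"term_A": 0.30, "term_B": 0.01, "term_C": 0.20}
p_mu = {"term_A": 0.001, "term_B": 0.02, "term_C": 0.65}

s_rho = top_n_terms(p_rho, 2)
s_mu = top_n_terms(p_mu, 2)
shared = s_rho & s_mu  # terms ranked in the top 2 by both measures
```

Comparing fixed-size sets in this way sidesteps the fact that p-values from the different measures are on different scales, which is the motivation given in the text.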

Evaluation measures 

Average specificities and the functional homogeneity index
(or semantic similarity density) of the rank normalized
term sets S_θ(n) for each measure θ ∈ {ρ, τ, μ, (ρ, τ),
(ρ, μ), (τ, μ), (ρ, τ, μ)} are computed for performance
comparison. This is based on the general assumption that,
for a specific set of GO terms identified by each measure,
the more functionally homogeneous the set is, the more
reliable the measure is. In addition, higher specificities
are more desirable because it is more informative to have
more specific terms than more general terms in the
functional analysis of clusters.

Many studies have shown that Information Content
(IC) can quantify the specificity of a cluster [26,27]. The
IC measure is based on the fact that less frequently used
terms are more specific. The IC of a GO term t is
defined as follows:

$$IC(t) = -\log \frac{freq(t)}{freq(root)} \qquad (4)$$

where root represents the root term of each GO category and
freq(t) is defined as follows:

$$freq(t) = n(annotate(t)) + \sum_{c \in children(t)} n(annotate(c)) \qquad (5)$$

where children(t) returns the list of child terms of
term t. Thus t is a parent term of all members of
children(t), either directly or indirectly. The functions
annotate(t) and n(G) return the list of genes that are
annotated to GO term t and the number of genes in
the gene list G, respectively. We use the average IC
value of the given term set as a performance measure to
compare the specificity.
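
A small Python sketch may help make the frequency and IC definitions concrete. It assumes a natural logarithm (the base is not stated in the text), treats children(t) as all direct and indirect descendants as described above, and counts each descendant once even if it is reachable along several paths. The function names, the toy hierarchy, and the gene identifiers are hypothetical.

```python
from math import log
from typing import Dict, List, Set


def descendants(term: str, child_map: Dict[str, List[str]]) -> Set[str]:
    """All direct and indirect child terms of `term` in the GO graph."""
    seen: Set[str] = set()
    stack = list(child_map.get(term, []))
    while stack:
        child = stack.pop()
        if child not in seen:
            seen.add(child)
            stack.extend(child_map.get(child, []))
    return seen


def freq(term: str, child_map: Dict[str, List[str]],
         annotations: Dict[str, Set[str]]) -> int:
    """Annotation count of the term plus those of all its descendants."""
    own = len(annotations.get(term, set()))
    below = sum(len(annotations.get(c, set()))
                for c in descendants(term, child_map))
    return own + below


def information_content(term: str, root: str,
                        child_map: Dict[str, List[str]],
                        annotations: Dict[str, Set[str]]) -> float:
    """IC(t) = -log(freq(t) / freq(root)); undefined when freq(t) is zero."""
    return -log(freq(term, child_map, annotations)
                / freq(root, child_map, annotations))


# Toy hierarchy (hypothetical): root -> a, b; a -> c.
toy_children = {"root": ["a", "b"], "a": ["c"]}
toy_annotations = {"a": {"g1"}, "b": {"g2", "g3"}, "c": {"g4"}}
ic_a = information_content("a", "root", toy_children, toy_annotations)
ic_c = information_content("c", "root", toy_children, toy_annotations)
```

In the toy hierarchy the leaf term c is used less often than its parent a, so it receives the larger IC, which matches the intuition that rarer terms are more specific.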

For the functional homogeneity index (or semantic
similarity density), we chose Resnik's widely used measure of
semantic similarity [28]. The semantic similarity between
two terms is defined as the IC of the lowest common
ancestor (LCA) of the two terms and hence is obtained by:

$$S_{Resnik}(t_A, t_B) = IC(LCA(t_A, t_B)) \qquad (6)$$

As an evaluation measure, the average of all pairwise
term-to-term Resnik similarities was computed over S_θ(n)
for each measure θ ∈ {ρ, τ, μ, (ρ, τ), (ρ, μ), (τ, μ), (ρ, τ, μ)}
and defined as semantic similarity density of the set.
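
The semantic similarity density can be sketched as follows. This is an illustrative implementation under stated assumptions: because GO is a directed acyclic graph, the common ancestor is taken here to be the shared ancestor with the highest IC (the usual way Resnik similarity is computed), a term is treated as its own ancestor, and a set with fewer than two terms is given a density of 0.0. The function names, the toy hierarchy, and the IC values are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set


def ancestors(term: str, parent_map: Dict[str, List[str]]) -> Set[str]:
    """The term itself plus all of its direct and indirect parents."""
    seen: Set[str] = {term}
    stack = list(parent_map.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parent_map.get(parent, []))
    return seen


def resnik(term_a: str, term_b: str, parent_map: Dict[str, List[str]],
           ic: Dict[str, float]) -> float:
    """IC of the most informative ancestor shared by the two terms."""
    common = ancestors(term_a, parent_map) & ancestors(term_b, parent_map)
    return max(ic[t] for t in common)


def similarity_density(terms: Iterable[str],
                       parent_map: Dict[str, List[str]],
                       ic: Dict[str, float]) -> float:
    """Average Resnik similarity over all unordered pairs of terms."""
    pairs = list(combinations(sorted(set(terms)), 2))
    if not pairs:
        return 0.0  # assumption: density of a set with < 2 terms is zero
    return sum(resnik(a, b, parent_map, ic) for a, b in pairs) / len(pairs)


# Toy hierarchy (hypothetical): root -> a, b; a -> c, d.
toy_parents = {"a": ["root"], "b": ["root"], "c": ["a"], "d": ["a"]}
toy_ic = {"root": 0.0, "a": 1.0, "b": 1.5, "c": 2.0, "d": 2.5}
toy_density = similarity_density({"c", "d", "b"}, toy_parents, toy_ic)
```

In the toy set only the pair (c, d) shares an informative ancestor, so the density is one third of that ancestor's IC; a set of closely related terms would score higher.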

GO terms and associated gene sets were downloaded 
from http://www.geneontology.org/gene-associations/ 
gene_association.goa_human.gz. We excluded GO asso- 
ciations having ND (No biological data) or NR (Not 
Recorded) evidence codes. 

Results 

Average specificity and functional homogeneity index 
distributions 

Figure 4 shows the distributions of average IC values and
functional homogeneity index for GO BP terms with p-values
in the top n = 100 ranks in the 'breast/up-regulated
miRNA cluster' from Volinia et al. [19] (Supplementary
Table S2 in 'Additional file 1'). Most of the highest
average IC and functional homogeneity values were obtained
by miRNA-centric measures throughout the evaluations
(see Supplementary Fig. S1 series in 'Additional file 1'),
including the specific example shown in Figure 4.
Because of the small numbers of miRNA members and
target genes, target variations #5, #10, #11, #15, and #16
in Table 1 had no significant GO terms. The evaluation
showed that the miRNA-centric measure exhibited the best
specificity and homogeneity except for target
variations #12, #19 and #22. The very small numbers of
miRNAs (i.e., m = 56, 54, 175, respectively) and target
genes (i.e., m = 160, 197, 1206, respectively) resulting
from the very strict thresholds may explain these
exceptions. These findings were consistent throughout the
evaluation study regardless of GO category.

Performance comparison with a varying parameter 
setting 

Figure 5(A) and 5(B) show the distributions of the
average IC values and functional homogeneity values

Figure 4 Evaluation of functional enrichment measures and their combinations. Distributions of (A) functional homogeneity index (or
average IC value) and (B) semantic similarity (or average all pair-wise Resnik's similarity) are exhibited for significantly enriched GO BP terms in
the 'breast/up-regulated miRNA cluster' from Volinia et al. [19] (see index 1 in Supplement Table S2) by applying target variation #6 in Table 1.
MicroRNA-centric measure (μ) outperforms the traditional target gene-centric measure (ρ) and others.

Figure 5 Evaluation of functional homogeneity and semantic similarity densities across different thresholds. Average (A) information
content and (B) all pair-wise semantic similarity values are plotted with increasing numbers of rank normalized GO terms n (see Fig. 3) for
"breast/up-regulated miRNA cluster" from Volinia et al. [19] (index 1 in Supplementary Table S1 in 'Additional file 1') by applying target variation
#6 in Table 1, GO BP category. Measures containing miRNA-centric μ (in blue) like (ρ, μ) (in pink) and (τ, μ) (in sky blue) consistently outperform
traditional gene-centric ρ (in red) measures at all levels.



Lee et al. BMC Genomics 2012, 13(Suppl 7):S17
http://www.biomedcentral.com/1471-2164/13/S7/S17






with increasing numbers of rank normalized GO terms
n (see Figure 3), as an example for the "breast/up-regulated
miRNA cluster" from Volinia et al. [19] (index 1 in
Supplementary Table S1 in 'Additional file 1') by applying
target variation #6 in Table 1, GO BP category.
Measures containing miRNA-centric μ (in blue cross)
like (ρ, μ) and (τ, μ) consistently outperformed
traditional gene-centric ρ (in red circle) at all threshold
levels of n. Figure 6 demonstrates the distribution of
p-values for all GO BP terms annotated to the miRNA clusters
from the dataset of Volinia et al. [19]. Although the
interpretation of a p-value distribution is generally
tricky and needs to be done carefully, it seems that the
p-value distribution for miRNA-centric μ (in green)
shows overall better discriminant power than the target
link-centric τ (in blue) and traditional gene-centric ρ (in
red) methods.

Examples showing complementary properties 

Examples of GO terms determined to be statistically
significant by the miRNA-centric μ method but not by the
traditional gene-centric ρ method are listed in the upper
part of Table 2. Gusev [13] correctly pointed out that it
was common for top ranked GO terms to be targeted by every
member of the corresponding miRNA cluster. Those that are
targeted by all six miRNA members (i.e., μ_i = 6) shown in
the upper part of Table 2, however, are not statistically
significant (p > 0.05) and show poor ranks (>290) by the ρ
method, whereas the μ method shows statistical significance
(p < 0.05) with high ranks (<35) (Table 2). In contrast,
those that are targeted by all six miRNA members shown in
the middle part of Table 2 show very strong statistical
significance (p < 0.001) by the ρ method. The μ_k to μ_l
ratios in the middle part (about 50:1), compared to
those in the upper part (about 1:1) of Table 2, clearly
explain the poor p-values and ranks (>2500) by the μ
method. Therefore, Gusev's correct intuition can further
be formally analyzed by introducing the miRNA-centric μ
method. Our new measure considering μ thus compensates for
some drawbacks of the traditional gene-centric ρ measure.

The GO terms in the lower part of Table 2 are annotated
only to two to five among the six miRNA members,
such that they are far from statistical significance by ρ
calculations. The p-values by the μ method, however, are
even more statistically significant. Complement activation
(GO:0006956) in the GO BP category was rejected by the
traditional ρ method (p = 0.42) but accepted by the
miRNA-centric μ method (p < 0.001), with ranks of 1251 and 1,
respectively. Complement activation indeed has long
been well recognized in breast cancer [29,30]. At least
four well-known breast cancer genes, including SMAD2,
SMAD4, TGFB3 and TGFBR3, are involved in palate
development. There are many studies reporting that
regulation of growth hormone secretion (GO:0060123) is
indeed associated with breast cancer [31-33]. For the GO
term negative regulation of activin receptor signaling
pathway (GO:0032926), many studies reported that

Table 2 Comparison of miRNA-centric μ and gene-centric ρ measures^
^ The 'breast/up-regulated' miRNA cluster data from Volinia et al. (2006) using the target variation #6 (see Table 1) was used.



facilitating activin signaling either by Cripto silencing or 
FLRG silencing inhibits human breast cancer cell growth 
[34,35]. Numerous studies have reported that acetyl-CoA 
carboxylase (ACCα) and fatty acid synthase (FAS), key
limiting fatty acid synthesis enzymes involved in coen- 
zyme A metabolic process (GO:0015936), are highly 
expressed in human breast cancer cell lines and breast 
carcinomas [36-40]. Moreover, pantothenate kinase 3 
(PANK3) and Coenzyme A synthase (COASY) are 
known breast cancer genes. 

Discussion 

We proposed miRNA-centric (μ) and target link-centric (τ)
measures that improve functional enrichment analysis of
differentially expressed or co-expressed miRNA clusters.
We performed comprehensive evaluations of different
methods in various settings. These new measures complement
the conventional target gene-centric ρ measure, and the
miRNA-centric μ method was among the most powerful and
reliable.

MicroRNA's intrinsic properties of multiplicity and
cooperativity [17] may be correctly modeled by combined
hypergeometric distributions. The average IC value for the
μ category was consistently the highest among different
conditions and measures. This suggests that the number of
miRNAs and their relations associated with a specific GO
term of interest is as important as the number of target
mRNAs associated with the GO term. Therefore, the ρ, τ,
and μ hypergeometric distributions are mutually
complementary for functional annotation of miRNAs.

The proposed method is based on computationally
predicted rather than experimentally validated target
relations. Computational prediction has limitations given
its high levels of false positives and negatives. In
particular, it is difficult to obtain predicted targets for
minor forms of miRNA such as star, -3p, -5p or other
recently identified forms of miRNAs. All current
computational enrichment analysis methods that use
predicted target relations suffer from the same drawback.
The three proposed methods, when combined, may complement
each other in finding and evaluating the correct
miRNA-mRNA target relations, and in improving functional
annotations and enrichment analysis.

Additional material 



Additional file 1: Supplementary Figures and Tables. This file
contains additional figures and tables mentioned in the main text.